Phrase Extraction for Japanese Predictive Input Method as Post-Processing
Author
Abstract
We propose a novel phrase extraction system that generates a phrase dictionary for predictive input methods from a large corpus. The system extracts phrases after the n-grams have been counted, so the extraction step can be maintained, tuned, and re-executed independently of counting. To extract Japanese phrases, we developed a rule-based filter based on part-of-speech (POS) patterns. Our experiment shows the usefulness of the system, which achieved a precision of 0.90 and a recall of 0.81, outperforming the N-gram baseline by a large margin.
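The pipeline described in the abstract (count n-grams first, then filter them by POS pattern as a separate post-processing step) can be sketched roughly as follows. This is only an illustrative sketch: the tag names, the `START_BLOCKED` and `END_BLOCKED` rule sets, the `min_count` threshold, and all function names are assumptions made for the example, not the rules or tag set used in the paper.

```python
from collections import Counter

# Hypothetical coarse POS tags and rules; the paper's real patterns are not given here.
START_BLOCKED = {"PARTICLE", "AUX", "SYMBOL"}   # a phrase may not begin with these
END_BLOCKED = {"PARTICLE", "PREFIX", "SYMBOL"}  # a phrase may not end with these


def count_ngrams(sentences, max_n=3):
    """Step 1: count all n-grams (n = 1..max_n) of (surface, pos) tokens."""
    counts = Counter()
    for sent in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1
    return counts


def is_phrase(ngram):
    """Rule-based POS filter: at least two tokens, with allowed first and last tags."""
    tags = [pos for _, pos in ngram]
    return (
        len(tags) >= 2
        and tags[0] not in START_BLOCKED
        and tags[-1] not in END_BLOCKED
    )


def extract_phrases(counts, min_count=2):
    """Step 2 (post-processing): filter already-counted n-grams into a dictionary."""
    phrases = Counter()
    for ngram, count in counts.items():
        if count >= min_count and is_phrase(ngram):
            phrases["".join(surface for surface, _ in ngram)] += count
    return phrases


def precision_recall(extracted, gold):
    """Precision and recall of an extracted phrase set against a gold phrase set."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy corpus, already tokenized and POS-tagged (tags are illustrative).
corpus = [
    [("お願い", "NOUN"), ("し", "VERB"), ("ます", "AUX")],
    [("よろしく", "ADV"), ("お願い", "NOUN"), ("し", "VERB"), ("ます", "AUX")],
]
counts = count_ngrams(corpus)
phrases = extract_phrases(counts)
```

Because `extract_phrases` reads only the stored counts, the filter rules can be changed and re-run without recounting the corpus, which is the maintainability property the abstract emphasizes.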
Similar resources
Automatic Semantic Sequence Extraction from Unrestricted Non-Tagged Texts
Morphological processing, syntactic parsing, and other useful tools have been proposed in the field of natural language processing (NLP). Many of these NLP tools take dictionary-based approaches. They are therefore often not very efficient with texts written in casual wording or texts that contain many domain-specific terms, because of the lack of vocabulary. In this paper we propose a simpl...
A New Text-Mining Method for Extracting User Context Information to Improve the Ranking of Search Engine Results
Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual and documentary material increases day by day, so we need useful ways to store it and retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...
Post-Processing of Stream Flows in Switzerland with an Emphasis on Low Flows and Floods
Abstract: Post-processing has received much attention during the last couple of years within the hydrological community, and many different methods have been developed and tested, especially in the field of flood forecasting. Apart from the different meanings of the phrase “post-processing” in meteorology and hydrology, in this paper, it is regarded as a method to correct model outputs (predict...
Do Heavy-NP Shift Phenomenon and Constituent Ordering in English Cause Sentence Processing Difficulty for EFL Learners?
Heavy-NP shift occurs when speakers prefer placing lengthy or “heavy” noun phrase direct objects in the clause-final position within a sentence rather than in the post-verbal position. Two experiments were conducted in this study, and their results suggested that having a long noun phrase affected the ordering of constituents (the noun phrase and prepositional phrase) by advanced Iranian EFL le...
A Polynomial-Order Algorithm for Optimal Phrase Sequence Selection from a Phrase Lattice and Its Parallel Layered Implementation
This paper deals with the problem of selecting an optimal phrase sequence from a phrase lattice, which is often encountered in language processing tasks such as word processing and post-processing for speech recognition. The problem is formulated as one of combinatorial optimization, and a polynomial-order algorithm is derived. This algorithm finds an optimal phrase sequence and its dependency structu...